Fix text_replace() silently corrupting source files containing \?\ on Windows#462
Open
AidanShipperley wants to merge 1 commit intoconda:mainfrom
Open
Conversation
… Windows Replace blanket strip of \?\ and //?/ byte sequences from all text data with targeted replacement that only matches extended-length prefixes immediately followed by the placeholder path. The previous approach (introduced in PR conda#432 for issue conda#398) would unconditionally remove all \?\ and //?/ occurrences from text files, corrupting legitimate source code that references those patterns (e.g., huggingface_hub's file_download.py). The new approach replaces extended-prefix + placeholder as a unit FIRST, then does the standard placeholder replacement. This correctly handles: - \?\C:\envs\myenv -> replaced cleanly - //?/C:\envs\myenv -> replaced cleanly - C:\envs\myenv -> standard replacement - Source code with \?\ (not adjacent to placeholder) -> preserved Fixes conda#461
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
\?\and//?/byte sequences from all text data intext_replace()with targeted replacement that only matches extended-length prefixes immediately followed by the placeholder path\?\(e.g.,huggingface_hub/file_download.py) while still correctly handling dangling extended-length prefixes left aftercore.pynormalizationRoot Cause
PR #432 (released in 0.9.0) fixed #398 by stripping
\?\and//?/from all text file data on Windows to clean up dangling extended-length path prefixes:While this successfully addressed the original issue, it also inadvertently affects any
.pyfile containing those byte sequences as legitimate source code (e.g.,"\\?\\"in Python string literals), causingSyntaxErrorat import time.Fix
Replace with targeted replacement that only strips extended-length prefixes when immediately followed by the placeholder:
This handles all three path cases without touching unrelated source code:
\?\C:\envs\myenv-> replaced cleanly (extended-prefix + placeholder as unit)//?/C:\envs\myenv-> replaced cleanly (extended-prefix + placeholder as unit)C:\envs\myenv-> standard replacement\?\(not adjacent to placeholder) -> preservedTesting
test_windows_extended_length_path_cleanupto verify targeted replacement behaviortest_windows_extended_length_path_with_placeholder_replacementfor targeted behaviortest_windows_multiple_extended_length_patternsto verify preservation of non-placeholder patternstest_windows_source_code_not_corrupted: regression test for issue conda-pack 0.9.0+ silently corrupts Python source files containing\\?\on Windows #461 (huggingface_hub case)test_windows_targeted_extended_prefix_replacement: verifies both\?\and//?/prefix stylestest_windows_mixed_extended_and_standard_paths: tests mixed extended and standard pathsTestUpdatePrefixtests are unchanged and should still passFixes #461